Exponential Language Models, Logistic Regression
Authors
Abstract
In this paper, we modify the traditional trigram model by adding utterance-level semantic coherence features to an exponential model. The semantic coherence features are collected by measuring the correlations among content-word pairs that occur in the sentences of two corpora: the real corpus and a corpus generated by the baseline trigram model. The measure we use to estimate the semantic association of content-word pairs is Yule's Q statistic. For our preliminary analysis, we further simplified the modeling task by extracting a small set of statistics from each sentence's Q statistics and applying them as features in the exponential model. We also simplified obtaining the MLE solutions of the exponential models by approximating them with a logistic regression model. We account for the uncertainty in the estimates of Q by constructing confidence intervals. The new model yields a slight reduction in test-set perplexity. We also discuss and compare alternative measures of association, such as statistics.
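The abstract does not give the exact counting scheme or feature set, so the following is only a minimal sketch of how Yule's Q and its confidence interval could be computed for content-word pairs. It assumes each sentence is reduced to a set of content words (stop words already removed) and that the 2×2 table counts sentences by whether they contain both words (a), only the first (b), only the second (c), or neither (d). The standard error is Yule's asymptotic formula SE(Q) = (1 − Q²)/2 · sqrt(1/a + 1/b + 1/c + 1/d). The names `pair_table`, `yules_q`, and `sentence_features`, the 0.5 zero-cell correction, z = 1.96, and the choice of mean/min/max as the "small set of statistics" are all hypothetical.

```python
import math
from itertools import combinations


def pair_table(corpus, w1, w2):
    """Hypothetical helper: count sentences (word sets) by co-occurrence of w1 and w2.

    Returns (a, b, c, d) = (both words, only w1, only w2, neither).
    """
    a = b = c = d = 0
    for words in corpus:
        has1, has2 = w1 in words, w2 in words
        if has1 and has2:
            a += 1
        elif has1:
            b += 1
        elif has2:
            c += 1
        else:
            d += 1
    return a, b, c, d


def yules_q(a, b, c, d, z=1.96):
    """Yule's Q = (ad - bc) / (ad + bc) with a normal-approximation confidence interval."""
    if 0 in (a, b, c, d):
        # Assumption: add 0.5 to every cell (Haldane-Anscombe) so Q and its SE stay finite.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    ad, bc = a * d, b * c
    q = (ad - bc) / (ad + bc)
    # Yule's asymptotic standard error of Q.
    se = 0.5 * (1.0 - q * q) * math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # Q lies in [-1, 1], so clip the interval to that range.
    return q, max(-1.0, q - z * se), min(1.0, q + z * se)


def sentence_features(sentence, corpus):
    """Hypothetical summary features: mean, min, and max Q over all content-word pairs."""
    qs = [yules_q(*pair_table(corpus, w1, w2))[0]
          for w1, w2 in combinations(sorted(set(sentence)), 2)]
    if not qs:
        # Fewer than two distinct content words: return neutral features.
        return 0.0, 0.0, 0.0
    return sum(qs) / len(qs), min(qs), max(qs)
```

For example, `yules_q(10, 5, 5, 10)` gives Q = 0.6 with a lower bound of about 0.11 and an upper bound clipped to 1.0; the width of that interval is one way to see how unreliable Q is for rarely seen pairs.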
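The abstract only states that the MLE of the exponential model is approximated by logistic regression. One standard way to see why this can work, offered here as an assumption rather than as the paper's exact procedure: if real sentences follow p(s) = p0(s) · exp(Σ λ_i f_i(s)) / Z, and we pool n1 real sentences (label 1) with n0 sentences sampled from the baseline p0 (label 0), then the log-odds of label 1 given s equal log(n1/n0) + Σ λ_i f_i(s) − log Z. That is linear in the features, so a logistic regression on f estimates λ, and its intercept yields log Z. The sketch below fits the regression by Newton-Raphson in NumPy; `fit_logistic` and `exponential_weights` are hypothetical names, and the demo uses a synthetic one-dimensional Gaussian feature for which the true answer is known.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Logistic regression with an intercept, fitted by Newton-Raphson (IRLS)."""
    Xd = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xd @ beta)))         # predicted P(real | features)
        grad = Xd.T @ (y - p)                          # gradient of the log-likelihood
        hess = Xd.T @ (Xd * (p * (1.0 - p))[:, None])  # negative Hessian
        beta = beta + np.linalg.solve(hess, grad)
    return beta[0], beta[1:]


def exponential_weights(real_feats, generated_feats):
    """Hypothetical helper: estimate lambda and log Z from real vs. baseline-generated features."""
    n1, n0 = len(real_feats), len(generated_feats)
    X = np.vstack([real_feats, generated_feats])
    y = np.concatenate([np.ones(n1), np.zeros(n0)])
    intercept, lam = fit_logistic(X, y)
    # The intercept equals log(n1 / n0) - log Z, so solve for log Z.
    log_z = np.log(n1 / n0) - intercept
    return lam, log_z


# Synthetic check: under the baseline, one feature f ~ N(0, 1).
# Tilting by exp(lam * f) gives f ~ N(lam, 1), with log Z = lam**2 / 2.
rng = np.random.default_rng(0)
true_lam = 0.8
generated = rng.normal(0.0, 1.0, size=(20000, 1))
real = rng.normal(true_lam, 1.0, size=(20000, 1))
lam_hat, log_z_hat = exponential_weights(real, generated)
print(lam_hat, log_z_hat)  # should be close to lambda = 0.8 and log Z = 0.32
```

In the setting of the abstract, the feature columns would be the per-sentence Q summary statistics rather than a synthetic Gaussian; the design choice is that a single convex regression replaces iterative scaling over the full sentence space.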
Similar resources
Interactive Feature Induction and Logistic Regression for Whole Sentence Exponential Language Models
Whole sentence exponential language models directly model the probability of an entire sentence using arbitrary computable properties of that sentence. We present an interactive methodology for feature induction, and demonstrate it in the simple but common case of a trigram baseline, focusing on features that capture the linguistic notion of semantic coherence. We then show how parametric regre...
Prediction of unwanted pregnancies using logistic regression, probit regression and discriminant analysis
Background: Unwanted pregnancy not intended by at least one of the parents has undesirable consequences for the family and the society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. Methods: In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were ...
Fitting a growth-pattern model to the heads of sunflower cultivars Lakomka and Progress under rainfed conditions
Given that sunflower is one of the most important plants for edible-oil production, the present study was conducted to determine the best nonlinear regression function for quantifying the growth of sunflower head diameter over time. In the present study, in order to fit the best regression model explaining the relationship between the increase in sunflower head diameter ...
A Segmented Regression Model for Description of Microbial Growth
A segmented regression model for the description of microbial growth has been suggested. The model is able to predict exponential growth, logistic growth, logistic growth with a decline phase, diauxic growth, microbial growth in synchronous cultures, and oscillatory growth